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Abstract: The complex structure and strong heterogeneity of advanced nuclear reactor systems pose 
challenges for high-fidelity neutron-shielding calculations. Unstructured meshes exhibit strong 
geometric adaptability and can overcome the deficiencies of conventionally structured meshes in 
complex geometry modeling. A multithreaded parallel upwind sweep algorithm for SN transport was 
proposed to achieve a more accurate geometric description and improve the computational efficiency. 
The spatial variables were discretized using the standard discontinuous Galerkin finite-element 
method (DGFEM). The angular flux transmission between neighboring meshes was handled using an 
upwind scheme. In addition, a combination of a mesh transport sweep and angular iterations was 
realized using a multithreaded parallel technique. The algorithm was implemented in the 2D/3D Sw 
transport code ThorSNIPE, and numerical evaluations were conducted using three typical benchmark 
problems: IAEA, Kobayashi-3i, and VENUS-3. These numerical results indicate that the 
multithreaded parallel upwind sweep algorithm can achieve high computational efficiency. 
ThorSNIPE, with a multithreaded parallel upwind sweep algorithm, has good reliability, stability, and 
high efficiency, making it suitable for complex shielding calculations. 
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1 Introduction 


High-fidelity neutron-shielding calculations are an essential part of advanced nuclear reactor system design. 
The discrete ordinate (Sj) method is one of the most popular deterministic particle transport methods owing to its 
high angular resolution [1]. Reliable modeling and simulation of Sy transport depend on two key essential conditions 
and fundamental components: reasonable spatial discretization schemes and mesh distributions with sufficient 
modeling fidelity. The two key factors interacted with and supported each other throughout the process. 

Conventional Cartesian structured grids are widely used and implemented because of their regular arrangement 
and friendliness to finite-difference (FD) spatial discretization schemes [2-4]. However, there is no general robust 
meshing algorithm for acquiring exact descriptions of complex structures for realistic modeling and simulations [5]. 
Adaptive mesh refinement techniques retain the original mesh distribution features and mark cells with large errors 
for refinement [6]. Local error estimation techniques and geometric data structures play significant roles in 
determining the stabilities and computational efficiencies of these systems. Unstructured grids use piecewise flat or 
curved surfaces to provide complicated geometries with the highest possible flexibility. With the development of the 
finite element method (FEM), unstructured grids have been widely applied to reactor physical analyses [7], 
multiphysics simulations [8], and thermal-fluid dynamics [9]. Combining finite element numerical methods and the 
SN method with unstructured meshes has become a trend in nuclear shielding design. 

Unfortunately, the first-order SN equation cannot be solved stably using the continuous FEM [10]. However, 


while the solutions to this hyperbolic equation are smooth along the streamlines, they may be discontinuous 


perpendicular to the chosen direction. To obtain a robust solution, several Petrov-Galerkin (PG) approaches that 
update the test function as the sum of the trial functions and introduce an additional stabilization term [11-13]. It was 
observed that the free stabilization parameter affects the convergence of the Sy equation and raises uncertainty 
because it is usually considered as a matter of experience. Additionally, PGs cannot generally produce symmetric 
positive-definite coefficient matrices or inherit the best approximation property when the transport equations contain 
nonself-adjoint operators. 

The discontinuous Galerkin finite element method (DGFEM) is a different category of stability improvement. 
This was pioneered by Reed and Hill for neutron transport problems in the early 1970s [14]. The DGFEM connects 
adjacent elements through the incoming interface fluxes from the upwind (upstream) meshes and applies the usual 
FEM procedure to a single mesh. However, the number of unknowns in the DGFEM is much larger than that in the 
continuous FEMs, which demands a matrix assembly and solution method. The standard matrix assembly procedure 
looks for a reliable iterative solver for a banded sparse system matrix solution and must handle the storage brought 
on by the increased freedom [15]. The matrix-free strategy stores the diagonal blocks and realizes the inversion of 
the transport operator by means of the matrix-vector product; however, a sum factorization technique to enable 
translation between vector entries and values or gradients in quadrature points is required [16]. Another commonly 
used technique is the conventional sweeping procedure, such as FDs, in which parallelization of the sweeping 
procedure is implemented to reduce the computing time, which may become important in a large SN order. 

In terms of parallelization, complex grid connection relationships, and irregular element sweep orders pose 
challenges to the algorithms. In 2000, Plimpton described an asynchronous, parallel message-passing algorithm that 
simultaneously performs sweeps from many directions across unstructured grids [17]. In 2002, Pautz used a low- 
complexity list-ordering heuristic to determine sweep ordering on a partitioned mesh [18]. The essence of these two 
algorithms is the conversion of the abstract element sweep order into a directed acyclic graph (DAG). Since 2010, 
several research institutions have conducted extensive studies on high-performance architectural applications [19, 
20]. 

A multigroup 2D SN transport code, ThorSNIPE, with unstructured meshes was developed and validated [21]. 
This code was extended to 3D SN transport calculations and employed a multithreaded parallel upwind sweep 
algorithm to achieve a more accurate description and improve the reliability and efficiency of transport calculations 
in complex geometries. The finite element method was implemented using an open-source finite element library 
deal. 1-9.4.0 [22]. The deal.II is an open-source C++ program library targeted at the computational solution of partial 
differential equations using adaptive finite elements and focuses on generality, dimension-independent programming, 
parallelism, and extensibility. It includes many state-of-the-art programming techniques for solving partial 
differential equations, linear algebra problems, and computer science strategies and offers users a modern interface 
for complex data structures and algorithms. With the assistance of this library, ThorSNIPE can be made dimension- 
independent, supporting different mesh sizes and types of finite elements. 

The remainder of this paper is organized as follows. In Section 2, the main theory and methodology of the 
ThorSNIPE code are described in detail, including the Sy weak-form equation based on DGFEM theory and the 
multithreaded parallel upwind sweep algorithm. The numerical results are presented and discussed in Section 3. The 
conclusions are summarized in Section 4. 


2 Theory and methodology 


2.1 Weak form of the discrete ordinates method equations 


The Sy form was obtained by solving the Boltzmann transport equation along discrete directions and replacing 
the integrals over the unit sphere with quadrature sets. The steady-state multi-group Sy neutron transport equation 
can be written as 


Q, Vuin+tnyi(r)=O%(r), g =1,...,G; m=1,...,M;r E€ D (1) 


Where D is the domain. y‘(r) is the angular flux in group g , direction Q, and position r. ys is the 
macroscopic total cross-section in group g. Q*(r) is a source term in group g, direction Q, and position r. 
For fixed source problems, it can be expressed as: 
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and for eigenvalue problems, it can be expressed as: 


sA S 2+] eog ak k Xe Se g (3) 
Q5 (r) dd, re OX (r)Y¥‘(Q,,) + ree ya (EE (r)b,.(7) 
M 
6,7) => owi) (4) 
m=1 


Where G and M is total groups and angles. y:>s(;) is the nth-order Legendre moment of scattering cross 
section from group g' to group g at position r. SE Cr) is the macroscopic fission cross section in group g' at 


position r. v! (r) is the average number of neutrons produced per fission in group g'. k „ is the eigenvalue. 


ø: (r) represents kth, nth-order flux moment at position r . Y (Q) is kth, nth-order spherical harmonic in 
direction Qat position r. g(r) is fixed-source in group g at position r. wA (r) is scalar flux in group g at 
position r. @,, is weight factor of quadrature set. 

Eq. (1) is a linear hyperbolic equation: It is necessary to transform this "strong form" differential-integral 
equation into "weak form" for solving. The spatial domain D is decomposed into K elements, which can be 
unstructured because the FEM is independent of the space dimension and grid choice. Consider V, as the surface 
of the kthelement and y, as the volume of the kth element, where k = 1,..., K . To obtain the weak form, Eqs. 
(1) is multiplied by an arbitrary test function y,* (r) and integrated into any element k . Eq. (1) can be rewritten 
in monoenergetic form as 


J, Qn Vn POV + [EO OAV =| 0, ry n.dV (5) 


The divergence theorem is then applied to Eq. (5), yielding: 
Jou, Or WW DAS = f, Yn IQ, Vy (AY + 
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Where n(r) is the unit normal vector to the surface of cell ov, at position r. Inner products on the volume and 
surface of any element are introduced to express the bilinear form [23]. 
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Substituting Eq. (7) and (8) into (6), 
(Vin Wn) gy, Vn nV Vin )y, + ZV n Vn y, = (Qn Ym y (9) 


The general boundary can be divided into three types based on its relative position: interior boundary, outflow 
domain boundary (əvi ={redV,,Q-n(r) > 0}) , and inflow domain boundary (ov, ={r eOV,,Q-n(r) < 0}) > 
The inflow-domain boundary conditions were prescribed as either vacuum or reflective for the incoming directions. 
So, Eq. (9) can be rewritten as 
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where wt, y,, and y” are the outflow angular, inflow angular, and interior face fluxes, respectively. Eq. (10) 
represents the weak form of the Sy equation. 


2.2 The discontinuous Galerkin finite element discretization scheme 


The Galerkin method is theoretically based on the weighted residual method, which multiplies the original 
partial differential equation by a test function. Seeking a numerical approximation of the angular flux term using 


Lagrange polynomials of degree P 
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where b,(r) is the Lagrange shape function, and @,, jis an unsolved coefficient. The matrix form is convenient 
for solving Eq. (10), which expresses @,, , and b F (r) in row vector notation, and Eq. (11) can be written as 
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In Galerkin discretization schemes, test functions share similar function space as basis functions, 
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Substituting Eq. (12) and (13) into (10), 
(b.b) y; Pn — (D7, Vb], Pn + (ZBB), Pn 


in 


m 
sh (wi cd Wm o 


a4 
= (On Wn y, Wa) 


Əv 
The upwind scheme is the essence of the DGFEM. Notably, although the test functions on the edges are always 

taken from within the element, the primal functions on the edges are taken upwind with respect to the streaming 

direction. For outgoing edges, the primal functions are taken from within the element itself, whereas for incoming 

edges, the primal functions are taken from the upwind neighbor element. 

2.3 Multithreaded parallel upwind sweep algorithm 

2.3.1 Cell-based sweep algorithm 

In the Sn method, a transport sweep was performed for all elements along each discrete direction. The ordering 
in which the elements and degrees of freedom (DOF) traverse affects the speed of convergence. As mentioned in 
Section 1, the DGFEM makes it possible to use a cell-based sweeping procedure in Sy methods rather than 
assembling a global matrix. 

The element sweep order for each discrete direction in the ThorSNIPE code was developed using an upwind 
scheme based on the deal.II library. In our case, an array with all relevant cells is created and sorted in the upwind 
direction using a "dealii :: DoFRenumbering :: CompareDownstream" object in the deal. FEM library. A 
straightforward 2D model with three material regions and various sweep schemes is shown in Fig. 1. Cells in a 
conventional scheme are often categorized from edge to center to make it easier to facilitate the creation of a global 
matrix. The downwind system performs renumbering in the direction opposite to the discrete direction, whereas the 
upwind scheme manages renumbering alongside it. The definitions of "upwind" and "downwind" are determined by 
multiplying the unit normal vector to the cell's surface by the cosines of the corresponding discrete direction. In 
reality, the mesh location, discrete direction, and sweep-define the order of each element. The sweep solution 
technique also complies with neutron transport laws. Not all pathways are mutually exclusive, and each distinct 
direction results in a unique order in which this solution technique can be applied. 

From the perspective of solving the equations, various sweep schemes also affect the calculation of the 
incoming angular fluxes y} in Eq. (14). Primal functions obtained from neighboring elements require numerical 
values with a certain precision. The calculation error was gradually reduced when the transport equation was solved 
along the direction of particle motion using the upwind scheme. However, other schemes must process 
nondirectional data, adapt, and learn based on the data received. This process can be laborious and error-prone, and 
the accumulation of errors inevitably affects the robustness of the numerical calculation method. Thus, the entire 
solution procedure required more iterations. 

Algorithm 1 displays the cell-based sweep algorithm for each group and angle once the sweep order has been 
determined. The local equation can be divided into two parts as follows: The transport matrix first makes 


contributions from the streaming and removal operators determined by the fixed mass matrix and total cross-section. 


Second, the right-hand side (RHS) vector includes contributions from scattering source operations and fixed source 
or fission source operators. The in-group and cross-group couplings comprised the contributions of the scattering 
sources. Lastly, a "dealii :: types :: global dof index" object translates the local solution to the global angular solution. 
This cell-based sweep technique delivers the DGFEM solution to all cells in a single sweep order without coupling 
integral terms, fully utilizing the key benefits of DGFEM approaches. 


Algorithm 1: cell-based sweep for each group and angle 
Input: g, m, Gao 
for upwind sorted cell do 
for DOF of cell i do 
for DOF of cell ; do 


transport _matrix(i, j) = -f b'(r)Q,, "Vb (r)dV + f 2, (r)b; (b; (r)dV 
for moment n do 


for moment k do 
for DOF of cell i do 


source _ rhs(i)+ = i b;(r) (z, (i (r) + 1 fixed o) (Q,,)dV 


for faces on cell do 
if outflow face then 
for DOF of face i do 
for DOF of face ; do 
boundary _ matrix(i, j) = f. Q,’ n(r)b,(ryb; (r)dS 
else if inflow face then 
for DOF of face 7 do 
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cell _ rhs(i) = source _ rhs(i) + boundary _ rhs(i) 


cell _ matrix(i, j) = transport _matrix(i, j)+ boundary _ matrix(i, j) 
solution(i) = cell _matrix(i, j) ‘cell _ rhs(i) 


boundary _ rhs(i) = 


for DOF of cell i do 
angular _ solution{m](local _ dof _ indices{i]) = solution(i) 
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Figure 1 (a) Material region distribution; (b) Conventional scheme; (c) Upwind scheme; (d) Downwind scheme 


The following two steps were employed to compute the local contributions of an individual cell and RHS: First, 
the integral is transformed from an actual cell & into a unit/reference cell k . For example, the streaming operator 
is transformed into 


I, b'(r)Q,,Vo(r)dV = f [OEA |] 2, VEC) ||det J®|av (15) 


where the hat indicates the reference coordinates, and Jç) is the Jacobian of the mapping r = F, (r) . Second, this 
integral is approximated using the Gauss-Legendre quadrature. This yields the formula 


f, 2° 2, VoMAV =f [IGN Fa) || Q,J EVC |ldet Cow, (19) 


where q is the index of the quadrature point, | is the location of the reference cell, and w, is the Gaussian 


quadrature weight. In this study, triangular and tetrahedral grids were used for the 2D and 3D cases, respectively. 


2.3.2 Multithread parallel strategy 


Parallel transport computation has gained popularity in the nuclear field because of the emergence of high- 
performance computing clusters [24]. How the processors schedule parallel jobs significantly affects the 
performance of code using parallel computing. The two basic techniques are spatial-domain decomposition (SDD) 
[25] and angular-domain decomposition (ADD) [26]. SDD techniques subdivide the spatial domain into subdomains, 
such that the common mesh-sweep method is performed concurrently in numerous subdomains at diverse angles. 
However, parallelizing the Sy spatial sweep is more challenging because of the upwind-downwind dependencies 
between the domain boundaries. 

ADD techniques are frequently considered more expandable and require less memory [27]. They entail a mesh 
sweep along a similar quadrant and are the same for all angles in the chosen quadrature [28]. All operations are 
statically scheduled to the participating processors without negatively affecting the load balance and are ideally 
suited for personal computers (PCs). In shared memory machines, the traditional approach to parallelism is to break 
code down into threads. The deal.II leverages the Threading Building Blocks (TBB) library [29] as basic wrappers 
to build an object called "tasks.” TBB offers transferable interfaces across many different platforms and abstracts 
the specifics of run-state signals onto individual threads. 
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Figure 2 Flowchart depicting multithreaded parallel strategy 

The entire procedure for the multithreaded parallel approach in the ThorSNIPE code is shown in Fig. 2. Both 
inner and outer iterations contain an angular loop. The number of CPU cores determines the total number of tasks in 
a PC system. The angular loops were divided into a specific number of subranges that were evenly distributed across 
the threads. A cell-based sweep sequence is used for each angular loop. A subrange is a task. The tasks are essentially 
independent during the iterations. They performed the calculations using the same mesh and equation structure as 
the respective quadrature set. However, the storage of angular fluxes is based on the local DOF indices calculated 
using sweep schemes. The calculation results for different subranges were collected to update the scattering source. 
Following the execution of the subranges, threads keep themselves busy by stealing complete portions of the 
subranges from other threads if they have finished their work. Thus, the code can fully exploit the system thread 
resource, which requires frequent interruptions by the operating system to allow other threads to execute on the 


available processor cores. 


3 Results and discussions 


The multithreaded parallel upwind sweep algorithm is validated using three typical benchmark problems. The 
IAEA benchmark with five different regions was used to test the performances of the different sweep schemes. The 
Kobayashi-3i benchmark with cavities and bend ducts was adopted to evaluate the performance of the parallel 
computation for different mesh sizes and angular quadrate sets. The VENUS-3 benchmark was used to validate the 
algorithm’s capabilities for realistic transport calculations. In all benchmarks, convergence criteria are 10° for the 


eigenvalue and 10~ for the average flux in each space interval. 
3.1 IAEA benchmark 


The IAEA benchmark is a 2D five-region calculation problem consisting of two large source zones and two 
large absorber zones surrounded by light water [30] and is designed for IAEA advanced reactor neutron transport 
computation. There is a strong nonuniformity in the fuel region, as shown in Fig. 3. The cross-sections are given in 
Table 1, and all boundary conditions are vacuum. 
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Figure 3 Geometry of the IAEA 5-region fixed problem 


Table 1 Cross-section of the IAEA 5-region fixed problem 


Region Elem t) Zf(cm-t) £ (cm1) s(cm?s~!) 
R1 0.60 0.079 0.53 1.0 
R2 0.48 0.0 0.20 0.0 
R3 0.70 0.0 0.66 1.0 
R4 0.65 0.043 0.50 0.0 
R5 0.90 0.0 0.83 0.0 


This benchmark problem is mainly used to analyze the effect of the element sweep order on the cell-based 
algorithm efficiency and assert the superiority of the upwind sweep scheme. Under the PyTy-Sg angular 
approximation, comparisons of the numerical results for the fixed source and eigenvalue cases simulated with 2338 
elements are presented in Tables 2 and 3, respectively. The relative deviations in the average flux were within 2.50% 
in the fixed-source case and within 5.00% in the eigenvalue case. All the numerical results show excellent agreement 
with the references, demonstrating the accuracy and validity of the algorithm and modeling approach for such 
strongly heterogeneous problems. The change in the element sweep order does not make a large difference to the 
results. This is because the transport sweep operations were performed individually on each element, and the results 
were added. Because matrix addition is commutative and these operators in the equation can be described as applying 
a sum of matrices, the orders of various components will not impact the final result. 

Table 2 Comparison of average flux of the IAEA benchmark 
Average flux (Relative Errors) cm?s-1 


Reference , : : 
Conventional Downwind Upwind 
R1 11.941 11.872 (0.58%) 11.874 (0.56%) 11.871 (0.59%) 
R2 0.545 0.547 (0.37%) 0.547 (0.37%) 0.546 (0.18%) 
R3 19.168 19.045 (0.64%) 19.051 (0.61%) 19.039 (0.67%) 
R4 0.843 0.838 (0.60%) 0.840 (0.36%) 0.837 (0.71%) 
R5 1.530 1.542 (0.78%) 1.553 (1.50%) 1.531 (0.07%) 


Table 3 Comparison of average flux and kerr of the IAEA benchmark 


Reference 


Average flux (Relative Errors) cm7?s! 


Conventional Downwind Upwind 
R1 0.01686 0.016371 (2.90%) 0.01661 (1.18%) 0.016643 (1.29%) 
R2 0.000125 0.000131 (4.80%) 0.000128 (2.40%) 0.000128 (2.40%) 
R3 0.000041 0.000040 (2.44%) 0.000041 (0.00%) 0.000041 (0.00%) 
R4 0.000295 0.000295 (0.00%) 0.000293 (0.68%) 0.000293 (0.68%) 
R5 0.000791 0.000799 (1.01%) 0.000788 (0.38%) 0.000789 (0.25%) 
keff 1.0083 1.007076 1.008371 1.008305 
pcm -12 7 1 


As shown in Fig. 4, the number of iterations was compared with the change in the number of elements in the 


fixed-source calculation. The numerical results indicated that the upwind sweep scheme had a stable convergence 


speed, regardless of the number of elements. The relationship between the number of iterations and elements grows 


exponentially in both types of sweep schemes. The primary functions on the edges are determined based on the 


upwind edges with regard to the streaming direction, and transport problems involve a directional flow of 


information. The upwind scheme fully uses the initial boundary conditions and provides spatial flux moments for 


the neighboring elements. The other sweep schemes increased the number of source and power iterations required 


to meet the convergence criteria in the subsequent matrix calculations to compensate for the lack of constraints. In 


addition, no overhead cost of assembling and solving the stiffness matrix in the current framework was observed for 


the three sweep schemes mentioned above. Therefore, the upwind sweep algorithm can save significant 


computational costs with fewer iterations and is suitable for DGFEM simulations. 
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Figure 4 Number of iterations with the change in the number of elements for different sweep schemes 


3.2 Kobayashi-3i benchmark 


The Kobayashi-3i benchmark [31] is a bend duct-type problem with void regions in a highly absorbing medium, 


as shown in Fig. 5. The cross sections used in this benchmark problem are listed in Table 4. This problem was set 


up to analyze the features of spatial and angular discretization and demonstrate the parallel efficiency of the 


multithreaded strategy. 
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Figure 5 Geometry of the Kobayashi-3i benchmark 


Table 4 Cross-section of the Kobayashi-3i benchmark 


Region Elem’) £ (cm t!) s(cm™>?s-1) 
R1 0.1 0.05 1.0 
R2 0.0001 0.00005 0.0 
R3 0.1 0.05 0.0 


Angular flux distribution over full domain with mesh sizes of 5 cm, 4 cm, 3 cm, 2.5 cm, and 2 cm are modeled 
and simulated. The quality of the quadrature sets was quantified by taking the root mean square (RMS) of the relative 
scalar flux of all N point detectors. 


(17) 


where the reference solutions ¢,,- were obtained using an analytical method [31]. The RMS of the relative 
errors in the scalar flux using different quadrature sets versus the number of ordinates is presented in Fig. 6a. It 
should be noted that the various approximations in the DGFEM with different mesh sizes converged at variable 
speeds and started to flatten as the number of ordinates increased under S2 angular approximations. 

The ThorSNIPE code runs on a PC with 8 cores / 16 threads (one 11th Gen Intel(R) Core(TM) i7-11700F CPU 
running at 2.5 GHz) to demonstrate the performance and gain of the multithread technology, t. The speedup ratio 
with respect to serial mode versus the order of quadrature sets is presented in Fig. 6b, where the parallel 
implementation efficiency exceeds 100% when the S12 angular approximations are attained, and the mesh size is 
less than 3 cm. The comparative results show that the proposed method can achieve a better acceleration effect as 
the number of meshes increases. It can also be observed that the speedup ratio increases more slowly when the S20 
angular approximations are attained. In this case, each thread has over 30 parallel tasks, and memory access becomes 
a bottleneck in the matrix solvers for partial differential equations. The numerical results are shown in Fig. 7, and 
the neutron fluxes at the key points are listed in Table 5. Compared with MCNP [31], the ThorSNIPE code effectively 
reduces angular discretization errors and fulfills the transport computational requirement to a certain degree. 
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Figure 6 (a) Root mean square of Scalar flux L2-error versus the order of quadrature sets for the Kobayashi-3i 
benchmark; (b) Speedup ratio versus the order of quadrature sets for the Kobayashi-31 benchmark 
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Table 5 Total flux comparison in the Kobayashi-31 benchmark 


Figure 7 Mesh distribution and scalar flux of the Kobayashi-3i benchmark 
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Case Co-ordinates MCNP (cms!) = ThorSNIPE cms"! Relative errors 

3A (5,5,5) 8.61578x10° 8.61432x10° -0.02% 
(5,15,5) 2.16130x10° 2.13602x10° -1.17% 
(5,25,5) 8.93784x 10"! 8.93551x10! -0.03 
(5,35,5) 4.78052x10"! 4.72878x10"! -1.08% 
(5,45,5) 2.89424x 107! 2.87933x10"! -0.52% 
(5,55,5) 1.92698x10"! 1.91669x10"! -0.53% 
(5,65,5) 1.04982x10"! 1.04268x10"! -0.68% 
(5,75,5) 3.37544x102 3.43534x102 1.77% 
(5,85,5) 1.08158x 107 1.08608x 102 0.42% 
(5,95,5) 3.39632x 10% 3.47318x10% 2.26% 


3B (5,55,5) 1.92698x10"! 1.91669x10" -0.53% 
(15,55,5) 6.72147x107 6.74001x107 0.28% 
(25,55,5) 2.21799x 107 2.20850x 107 -0.43% 
(35,55,5) 9.90646x 107 9.74980x 10% -1.58% 
(45,55,5) 3.39066x 10% 3.43623x107 1.34% 
(55,55,5) 1.05629x107 1.04113x10% -1.44% 
3C (5,95,5) 3.44804 107 3.45115x107 0.09% 
(15,95,5) 2.91825x10+ 2.98723x10+ 2.36% 
(25,95,5) 2.05793x107 2.07437x107 0.80% 
(35,95,5) 2.62086x107 2.63330x107 0.47% 
(45,95,5) 1.05367x104 1.07620x 104 2.14% 
(55,95,5) 4.44962x10° 4.46921x10° 0.44% 


3.3 VENUS-3 benchmark 


The VENUS-3 benchmark was proposed to validate the capabilities of the calculation methodologies and cross- 
section libraries for predicting fluence rates in RPVs because it has a very clean structural geometry representing 
standard PWR pressure vessel conditions [32]. Fig. 8 illustrates the geometric configuration and material distribution 
of the VENUS-3 facility. The VENUS-3 model includes two types of fuel rods: the PLSA, core baffle, water reflector, 
barrel, Pyrex control rods, barrel, neutron pad, three types of grids, bottom support, reactor vessel, and jacket inner 
walls. Detailed descriptions and qualifications were obtained from the benchmark document. The benchmark 
arranges 386 dosimeters, with 244 *8Ni(n, p)°*Co dosimeters, 104 !!5In(n, n’)!'>™In dosimeters, and 38 ’Al(n, a)*4Na 
dosimeters placed in 268 different spatial locations. The experimental rate sets are provided as a ratio of the 


calculated reaction rate to the average dosimeter cross-section, named the equivalent fission flux. 
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Figure 8 Geometry of the Venus-3 benchmark 
The numerical results shown in Fig. 9 were modeled and simulated using the PyTy-Sg angular discretization, 
199 KASHIL-E70 neutron groups multigroup cross-sections [33], and 407819 tetrahedral elements. Fig. 10 shows a 
C/E comparison of the equivalent fission fluxes at the three detector positions. For the 'In(n, n’)!>™In dosimeters, 
almost all the equivalent fission flux deviations are included within +/-10%, except for a detector position (No. 31 


located in the core barrel with 0.7° angle) where the equivalent fission flux numerical value is affected by reflective 


boundary. For the 7’Al(n, a)*4Na dosimeters, 92% of the equivalent fission flux deviations were within +/-10%, and 
the remaining three dosimeters overestimated the corresponding experimental values by approximately 11.9%. For 
the **Ni(n, p)*8Co dosimeters, 95% of the equivalent fission flux deviations are limited within +/-10%, except for 11 
detector positions where the equivalent fission flux numerical values overestimated the corresponding experimental 
values by about 10.1%-20.7%. They are concentrated at the junction of the PLSA region, outer baffle, and core 
barrel. The scalar flux distribution is not smooth because of the strong spatial nonuniformity of the geometry model, 
and higher requirements are proposed for local mesh refinement, which significantly degrades the simulated 
accuracy of these regions. The overall agreement between the numerical results from the ThorSNIPE code and the 
experimental results was very good, which shows that the ThorSNIPE code is highly reliable and stable for complex 


shielding calculations. 
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Figure 9 Mesh distribution and fast neutron fluxes of the VENUS-3 
The numerical results [32] from TORT-3.2 with 1004295 hexahedral grids and 26 BUGLE-96 neutron energy 
groups were selected as comparative references. For the uniformly distributed 3.3% fuel regions, TORT-3.2 with 
hexahedral structured meshes achieved better results. The ThorSNIPE code with tetrahedral unstructured meshes 
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can decrease the numerical oscillation for complex regions. For the *’Al(n, a)**Na dosimeters, the numerical results 
of ThorSNIPE tended to be lower than those of TORT-3.2 in the core barrel. This underestimation might have been 
caused by an error in the average dosimeter cross section. 

In Fig. 11, the speedup ratios of the three quadrature sets with the P3 Legendre expansion order and the four 
Legendre expansion orders with PyTy-Sg angular discretization are shown. The test conditions were consistent with 
TORT-3.2 code. Overall, the behavior of the proposed algorithm was similar to that of the Kobayashi-3i benchmark. 
For PyTw-Si2, the acceleration ratio was 6.47 with 407819 elements. Comparing the results of different expansion 
orders, the Ps expansion order has the steepest upward trend, and the maximum speedup ratio achieved was 6.98. 
The numerical results indicate that the proposed algorithm achieves a favorable speedup ratio as the scale of the 


mesh increases, as expected. 
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Figure 10 C/E comparison of equivalent fission fluxes at detector positions: (a) *$Ni(n,p)>°Co from No. 1 to 139; 


(b) *8Ni(n,p)°°Co from No. 140 to 244; (c) !*In(n,n’)!!5™In; (d) 77Al(n,a)*4Na 
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Figure 11 Speedup ratio with the change in the (a) order of quadrature sets; (b) order of Legendre expansion 


4 Conclusions 


In this study, a multithreaded parallel upwind algorithm to solve the first-order Boltzmann neutron equation is 
proposed and implemented in the multigroup 2D/3D Sy transport code ThorSNIPE. The cell-based upwind sweep 
algorithm realizes angular flux transmission between neighboring meshes and achieves a stable solution by applying 
DGFEM spatial discretization. The multithread parallel strategy automatically manages the angular sweep stack of 
each thread and improves the quality and efficiency of the ThorSNIPE code. 

The performance and accuracy of the proposed algorithm were tested using several typical benchmark problems. 
In the IAEA benchmark, the upwind sweep algorithm was proven to be the optimal method for DGFEM calculations 
and can save significant computational costs. The results show that the ThorSNIPE code agrees well with the 
reference, with a deviation of 1 pcm in eigenvalue and within 2.50% for flux distribution. In the Kobayashi-3i 
benchmark, the results for angular flux simulations over the full domain in different mesh sizes and quadrature set 
orders indicate that the multithreaded parallel algorithm has potential extensibility. In the VENUS-3 benchmark, 96% 
of the equivalent fission flux deviations are limited within +/-10%, except for 15 detector positions where the 
equivalent fission flux numerical values overestimated the corresponding experimental values by about 10.1%- 
20.7%. The maximum speedup ratio was 6.98 with 407819 elements. The results of the VENUS-3 benchmark show 
that this robust algorithm can play a reliable role in practical engineering applications and improve computational 
efficiency. 

In the future, angular and algebraic multigrid methods planned to solve discretized equations to overcome the 
reduced convergence speed of the iteration method will be investigated. Furthermore, we plan to address more 
complex problems to assess the accuracy and effectiveness of the proposed method and present a rigorous 
mathematical convergence analysis of the new scheme. 
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